Coronaviruses are positive-strand RNA viruses of extraordinary genetic complexity and diversity. In 
addition to a common set of genes for replicase and structural proteins, each coronavirus may carry 
multiple group-specific genes apparently acquired through relatively recent heterologous recombination 
events. Here we describe an accessory gene, ORF3, unique to canine coronavirus type I (CCoV-I) and 
characterize its product, glycoprotein gp3. Whereas ORF3 is conserved in CCoV-I, only remnants remain 
in CCoV-II and CCoV-II-derived porcine and feline coronaviruses. Our findings provide insight into the 
evolutionary history of coronavirus group 1a and into the dynamics of gain and loss of accessory genes. 


Coronaviruses (CoVs), enveloped positive-strand RNA 
viruses of human clinical and veterinary relevance, are ex- 
ceptional in terms of genetic complexity and variety. With 
genome sizes of ~30 kb, they are the largest RNA viruses 
known thus far (17, 42). One major contributing factor to 
CoV diversity is high-frequency RNA recombination (1, 28, 
33). New sero- and biotypes have arisen from homologous 
RNA recombination, i.e., the exchange of corresponding 
sequences among related CoVs (3, 21, 25, 27, 29, 43), while 
heterologous RNA recombination events with noncoronavi- 
ral donor RNAs have led to the acquisition of novel genes 
(31, 44, 56). 

All CoVs have a similar genome organization with a com- 
mon set of five genes arranged in a conserved order (10, 12). 
The polymerase gene, occupying the 5’-most 70% of the ge- 
nome, encodes the replicase polyproteins from which up to 16 
mature products are derived as well as an unknown number of 
functional processing intermediates (58). Downstream of the 
polymerase gene and expressed through a 3’-coterminal nested 
set of subgenomic (sg) mRNAs are the genes for the structural 
proteins S, E, M, and N (12). In addition, each CoV may 
possess up to seven group-specific “accessory” genes that are 
also expressed from sg mRNAs (34). In most cases, the func- 
tions of the accessory gene products are not known, and in 
general, they are not essential for replication in cultured cells 
(6, 36, 41, 53-55). Quite the opposite, their expression might 
decrease viral fitness in vitro, and mutants with inactivated 
accessory genes readily become selected during serial passage 
(22, 30, 45, 51). In field strains, however, accessory genes as a 
rule are maintained (13, 22, 43), and their loss, either through 
spontaneous mutation (37) or by design via reverse genetics, 
generally causes loss of virulence in the natural host (11, 
20, 36). 

CoVs can be divided into three main phylogenetic groups 
(16). Canine coronaviruses (CCoVs), common enteric 
pathogens of dogs (8, 47), belong to subgroup 1a together 
with feline coronaviruses types I and II (FCoV-I and FCoV- 
II, respectively) and transmissible gastroenteritis virus 
(TGEV) of swine (16, 18). Like FCoV, CCoVs occur in two 
serotypes (39), CCoV types I and II (CCoV-I and CCoV-II, 
respectively) sharing ~90% sequence identity in most of 
their genome (A. Lorusso, N. Decaro, C. Buonavoglia, and 
R. J. de Groot, unpublished data). In the coding region for 
the S ectodomain, however, sequence identity is only 56%. 
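
The identity figures quoted above (~90% genome-wide, 56% in the S ectodomain) are percentages of matching positions in a pairwise alignment. The text does not say how gaps were treated, so the following is only a minimal sketch under the assumption that gapped columns are skipped; the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical data): 8 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
```
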

The evolutionary history of CoV group 1a is not completely 
understood, but it apparently entailed multiple homologous 
and heterologous recombination events. The available data 
suggest that CCoV-I and FCoV-I arose by linear descent from 
a common ancestor and that recombination of CCoV-I with an 
unknown CoV led to acquisition of a new S gene, thus giving 
rise to CCoV-II (Lorusso et al., unpublished). In turn, 
CCoV-II strains donated this S gene and flanking sequences 
in recombinational exchanges to FCoV-I strains, leading to 
the independent emergence of FCoV-II strains (21). TGEV 
also appears to be of CCoV-II origin; in phylogenetic analyses 
of the genomic region downstream of the S gene, TGEV con- 
sistently clusters with extant CCoV-II field strains (7). 

To date, the FCoVs and CCoVs described share the same 
complement of accessory genes, three of which (the “ORF3abc 
cluster”) are located between the S and E genes (10, 14, 24, 46, 
50, 52). The two remaining ones, ORF7a and ORF7b, are 
located at the 3’ end of the genome (9, 24, 50) (Fig. 1a). In 
TGEV and related porcine CoVs, ORF3b is inactivated and 
ORF7b is lacking. Here we describe a novel functional accessory 
gene unique to CCoV-I and discuss the implications of 
our findings for our understanding of the evolutionary history 
of group 1a CoVs. 

FIG. 1. Comparative sequence analysis of CCoV-I ORF3 and its 
product. (a) Schematic representation of the CCoV-I genome 
organization in comparison to those of CCoV-II, FCoV-I, and FCoV-II. 
Open boxes indicate the genes for the replicase polyproteins, structural 
proteins (S, E, M, and N), and accessory proteins (ORF3abc, ORF7a, 
and ORF7b proteins). CCoV-I ORF3 is indicated by a hatched box. 
(b) Conservation of ORF3 among CCoV-I field variants. An alignment 
is shown of ORF3 protein sequences from eight CCoV-I isolates. A 
conserved potential signalase cleavage site between A'* and K*> 
(http://www.cbs.dtu.dk/services/NetNGlyc/) is indicated by a thick 
black arrow; a potential N-glycosylation site at N110 is indicated by 
a lollipop. The corresponding ORF3 nucleotide sequences were deposited in 

ORF3, a functional accessory gene unique to CCoV-I. During 
sequence analysis of CCoV-I variant Elmo/02 (39), we 
discovered, immediately downstream of the S gene, a 
624-nucleotide (nt) open reading frame (ORF3) that is absent in all 
other group 1a CoVs studied so far. ORF3 is preceded by a 
canonical group 1a transcription-regulating sequence (TRS) 
(9, 24, 26, 48), which suggests that in infected cells it is 
expressed through a separate dedicated sg mRNA species. The 
encoded protein, 207 residues in length with a predicted 
N-terminal signal peptide and a single potential N-glycosylation 
site at Asn110, bears no significant sequence identity to any 
viral or cellular protein in the NCBI database (Fig. 1a and b). 
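
Two of the numbers in this paragraph can be checked with simple sequence arithmetic. The sketch below is illustrative only: it assumes the 624-nt ORF length includes the stop codon, which is consistent with a 207-residue product, and it scans for the generic N-X-S/T sequon (X not proline) rather than reproducing the neural-network prediction of NetNGlyc. The function names and the toy peptide are hypothetical; the real gp3 sequence is not used here.

```python
import re


def protein_length(orf_nt_length):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_nt_length % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt_length // 3 - 1  # subtract the stop codon


# N-X-S/T sequon with X != Pro; the lookahead reports overlapping sites too.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """1-based positions of Asn residues in potential N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


print(protein_length(624))                # 208 codons minus the stop codon
print(sequon_positions("MKNLSAANPTNGT"))  # toy peptide, not gp3
```

A sequon is only a necessary condition for N-glycosylation, which is why the translation experiments with microsomes described later in the text are needed to confirm that the site is actually used.
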

To study whether ORF3 is conserved among CCoV-I 
strains, viral RNA was extracted from fecal samples taken from 
eight dogs with natural CCoV-I infection, and reverse tran- 
scription-PCR was performed with primers designed after re- 
gions in the S gene and ORF3a. Comparative sequence anal- 
ysis of the resulting amplicons showed that ORF3 was present 
and intact in every CCoV-I strain tested, with nucleotide and 
amino acid sequence identities of 91.7 to 99.7% and of 90 to 
100%, respectively (Fig. 1b). Apparently, under field condi- 
tions, CCoV-I strains maintain ORF3. Most nucleotide 
changes are synonymous; the ratio of the rate of nonsynony- 
mous substitutions to the rate of synonymous substitutions (5) 
ranges between 0 and 0.25, which is indicative of purifying 
selection. We interpret the combined observations to suggest 
that the ORF3 product is functional during natural CCoV-I 
infection and that amino acid changes that interfere with its 
function are selected against. 
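
To illustrate the distinction behind this argument, the following sketch classifies codon differences between two aligned, in-frame coding sequences as synonymous or nonsynonymous using the standard genetic code. It is a simplified first step, not the method of reference 5: a proper rate ratio also normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple substitutions. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are aligned without gaps, in frame, and contain
    only the bases A, C, G, and T.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of 3")
    syn = 0
    nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: silent change
        else:
            nonsyn += 1   # amino acid replaced
    return syn, nonsyn


# Toy example: GCT->GCC keeps Ala (synonymous); AAA->AGA is Lys->Arg.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

A preponderance of synonymous changes, as reported for ORF3, is the pattern expected when amino acid replacements are selected against.
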

GenBank (accession numbers AY528745 through AY528751, AY426983, 
and AY426984). (c) The IGRs between the S and ORF3a genes in 
CCoV-II and derivative viruses contain ORF3 remnants. Shown is a 
nucleotide sequence alignment of the 5' and 3' ends of the Elmo/02 open 
reading frame and flanking IGRs to the S-ORF3a IGRs of CCoV-II 
strains BGF-10 (GenBank accession number AY342160), 1-71 
(EF056487), Insavc1 (D13096), CB/05 (DQ112226), and 229/05 
(unpublished data), TGEV strain Purdue (DQ811789), and FCoV-II 
strains 79-1683 (Y13921) and 79-1146 (AY994055). Of the Insavc1 and 
1-71 IGRs, an internal region of 71 nt could not be aligned with 
certainty and was omitted from the comparison. Sequences are 
indicated by circled numbers as follows: 1, ORF3 TRS; 2, termination 
codon of the S gene; 3, ORF3 initiation codon; 4, ORF3 termination 
codon; 5, ORF3a TRS; 6, ORF3a initiation codon. Note that in 
CCoV-II strains 1-71, Insavc1, and BGF-10, there is conservation both 
of the 5' and 3' ends of ORF3 and of the IGRs. In CCoV-II strains 
CB/05 and 229/05 as well as in the FCoV-II strains and TGEV, 
sequence conservation at the 3' end is limited to the ORF3/3a IGR and 
the 3'-most 5 nucleotides of ORF3 (black arrowhead). Further 
upstream, the latter viruses can be readily aligned with each other, but 
not with ORF3. (d) Amino acid sequence alignment of the N and C 
termini of the Elmo/02 ORF3 protein with translated IGR sequences 
of CCoV-II strains BGF-10, 1-71, and Insavc1, TGEV strain Purdue, 
and FCoV-II strain 79-1683. Numbers indicate amino acid positions in 
the CCoV-I ORF3 protein; termination codons are indicated by 
asterisks. aa, amino acids. 

FIG. 2. ORF3 encodes a glycoprotein with a cleavable N-terminal 
signal sequence. (a) Linear representation of the wild-type ORF3 protein 
and derivatives expressed from pTUG-31-based expression plasmids. The 
signal sequences of the ORF3 product and of CD5 are indicated by 
shading. Signalase cleavage sites (black arrowheads) and potential 
N-glycosylation sites (white lollipops) are indicated. (b) In vitro 
translation of wild-type (wt) ORF3 and derivatives. Translations were 
performed either in the absence (−) or presence (+) of dog pancreas 
microsomes (Micros.). Prior to SDS-polyacrylamide gel electrophoresis 
(PAGE) analysis, translation products were treated with PNGase F 
(PNG-F) (+) or left untreated (−). The positions and masses (in 
kilodaltons) of proteins from the molecular size markers are shown at 
the left. The positions of the various products are shown at the right 
as follows: G/SP−, glycosylated product without signal peptide; 
nG/SP+ (CD5), nonglycosylated product with CD5 signal peptide; 
nG/SP−, nonglycosylated product without signal peptide; nG/SP+ (gp3), 
nonglycosylated product with gp3 signal peptide. (c) Proteinase K 
protection assay. ORF3 derivative M7→I was translated either in the 
absence (−) or presence (+) of microsomes (Micros.) and treated with 
proteinase K (Prot. K) (+) or mock treated (−) prior to SDS-PAGE 
analysis. The positions of the various products are shown on the 
right as described above for panel b; the positions of molecular size 
standards and their masses (in kilodaltons) are shown on the left. A 
sample from a translation reaction supplemented with water instead of 
expression plasmid was included as a negative control (m). 

CCoV-I ORF3 encodes a glycoprotein. CCoV-I cannot be 
propagated in tissue culture (38), impeding analysis of viral 
mRNAs and proteins in the context of the infected cell. To 
study the biochemical properties of the CCoV-I ORF3 product 
experimentally, we performed in vitro translation in TNT 
coupled rabbit reticulocyte lysates in the presence and absence 
of dog pancreas microsomes (Promega). The assays were carried 
out with a series of pTUG-31-based expression plasmids 
containing ORF3 of CCoV Elmo/02 and derivatives thereof, 
cloned downstream of the T7 RNA polymerase promoter (Fig. 
2a). Translation of the wild-type gene in the absence of 
microsomes yielded not one but two products with molecular weights 
of 22,000 (22K) and 25K [Fig. 2b, panel ORF3 (wt), leftmost 
lane]. Scrutiny of the Elmo/02 ORF3 sequence revealed an 
AUG at codon position seven (Fig. 2a). Reasoning that internal 
initiation of translation at this site might give rise to an 
additional, smaller product, we replaced Met7 by Ile (note that in 
many naturally occurring CCoV-I strains, Ile is found at this 
position [Fig. 1b]). Upon expression of this mutant in the 
absence of microsomes, only a single protein species was 
found. Surprisingly, however, it was not the 25K product but 
the faster-migrating 22K product [Fig. 2b, panel ORF3 
(M7→I), leftmost lane]. Apparently, the intact signal peptide 
of the ORF3 protein causes aberrant migration in sodium 
dodecyl sulfate (SDS)-polyacrylamide gels. Indeed, expression 
of a mutant with the ORF3 signal peptide replaced by that of 
CD5 yielded a single protein species of 25K [Fig. 2b, panel 
ORF3 (CD5-SP), leftmost lane], in accordance with its 
calculated molecular mass (25.7 kDa). 

Translation of wild-type ORF3 and derivatives in the 
presence of microsomes consistently yielded two additional 
products of 28K and 23K [Fig. 2b, panels ORF3 (wt), ORF3 
(M7→I), and ORF3 (CD5-SP), middle lanes] that were fully 
protected from digestion by proteinase K (Fig. 2c) and 
hence appeared to be contained within the microsomal 
lumen. The 28K product is N glycosylated; treatment of 
translation products with the endoglycosidase PNGase F (Promega) 
resulted in loss of this protein species with a concomitant 
increase in the amount of the 23K product; the latter 
comigrated with mutant ΔSP, which lacks a signal peptide [Fig. 2b, 
panel ORF3 (ΔSP)]. The combined findings conclusively 
show that CCoV-I ORF3 codes for a 28K glycoprotein (gp3) 
with a cleavable N-terminal signal sequence; hydrophobicity 
plots did not reveal additional transmembrane regions (data 
not shown), indicating that gp3 is either a secretory or 
peripheral membrane protein. 

ORF3 remnants in CCoV-II and in CCoV-II-related viruses. 
Comparative sequence analyses suggest that the horizontal 
gene transfer that resulted in the CCoV-I/II split was restricted 
to the coding sequences for the signal peptide and 
ectodomain of S and should have left ORF3 intact (not 
shown). Under the assumption that CCoV-I represents the 
parental biotype and CCoV-II its recombinant offspring, the 
latter must have lost ORF3 subsequently. Indeed, close inspec- 
tion of the intergenic regions (IGRs) separating the S and 
ORF3a genes in CCoV-II variants (4, 7, 24, 32, 40), in FCoV-II 
strains 79-1146 (NCBI accession number AY994055) and 79- 
1683 (NCBI accession number Y13921) (the S gene and down- 
stream sequences are of CCoV-II origin in this strain; 21), and 
in TGEV (57) revealed remnants of ORF3 and/or its preceding 
IGR (Fig. 1c and d). In FCoV-I strains C1Je and Black 
(14, 46), however, there is no trace of ORF3, and the S gene 
and ORF3a are separated by a very short 11-nt IGR that 
except for the TRS bears little similarity to the IGRs preceding 
ORF3 or ORF3a in CCoV. Presumably, ORF3 was acquired 
after type I FCoV and CCoV diverged, although the possibility 
that ORF3 was already present in their common ancestor and 
then lost completely in the FCoV lineage cannot be excluded 
(Fig. 3). In any case, our findings suggest that while gp3 is 
important during CCoV-I infection, it became obsolete in 
CCoV-II. 

Is gp3 advantageous only in combination with CCoV-I S 
protein? The function of gp3 is not known, but based upon its 
biochemical properties, it may act either in the infected cell 
within the compartments of the exocytotic route or in the 
extracellular milieu. Given that apart from ORF3 the main 
difference between CCoV-I and CCoV-II strains lies in the 
type of S proteins they carry, it would seem that the function of 
gp3 is in some way connected to the function of S, i.e., gp3 
apparently provides an advantage only in combination with a 
type I spike protein. Conceivably, gp3 may be involved in the 
biogenesis of CCoV-I S or required for S-mediated attachment 
or fusion during entry. There is the alternative possibility, 
however, that gp3 is advantageous to the virus only in certain 
types of host cells or tissues correlating with the cell tropism 
conferred by S. Accumulating evidence suggests that FCoV-I 
and -II, and hence by extension CCoV-I and -II, recognize 
different receptors, which may well translate to a difference in 
host cell preference (2, 15, 23). 

ORF3 would not be the sole example of an accessory gene 
lost after a tropism change. ORF3c is conserved among 
group 1 CoVs, yet it is inactivated in FCoV variants that 
cause feline infectious peritonitis; loss of expression seem- 
ingly correlates with a shift from enteric to systemic infec- 
tion and in host cell tropism from enterocytes to monocytes 
(49). In severe acute respiratory syndrome CoV, loss of nsP8 
may have been the consequence of cross-species transmis- 
sion and adaptation to the human host (19, 35). In the case 
of TGEV, adaptation of CCoV-II to swine apparently was 
accompanied by inactivation of ORF3b and loss of ORF7b. 
Clearly, further studies of CoV accessory proteins are 
warranted, as these studies will not only broaden our 
understanding of coronavirus host adaptation and speciation but 
may also open new avenues to antiviral intervention. 

FIG. 3. Hypothetical scenario for the evolution of CoV cluster 1a. 
(a) Rooted neighbor-joining tree inferred from multiple amino acid 
sequence alignments of the M and N proteins, illustrating the 
evolutionary relationships between members of phylogroup 1a. Human CoV 
229E served as an outgroup. Support for bifurcations from 100 
bootstraps is indicated. PRCoV, porcine respiratory CoV. (b) CCoV-I and 
FCoV-I apparently arose from a common ancestor by linear descent. 
As these viruses diverged, several distinct RNA recombination events 
led to the emergence of CCoV-II, FCoV-II, and TGEV. Details are 
explained in the text. Question marks indicate steps that are not yet 
completely understood. 